Large-scale Imputation for Complex Surveys
Abstract
Much of the recent research into imputation methodology has focused on developing optimal procedures for a single variable or set of variables, where the patterns of missingness and the underlying distributions follow standard forms. In practice, however, it is frequently necessary to impute many variables from a single survey, with an even larger set of potential covariates and complex covariance structures among the variables to be imputed. Further, the imputations need to be completed in a relatively short time frame and within a constrained budget. The analyst is also unlikely to be able to anticipate all of the important analyses for which the imputed data will be used. These constraints often prevent analysts from producing optimal imputations for each variable. Instead, one tries to produce a set of imputed variables that minimizes the attenuation of key relationships, ideally reduces nonresponse bias, and satisfies the time and budgetary constraints.
Similar resources
Multiple Imputation in a Complex Sample Survey
Multiple imputation for missing survey data is a relatively new concept. As defined by one of its leading proponents, "multiple imputation is the technique that replaces each missing or deficient value with two or more acceptable values representing a distribution of possibilities" (Rubin 1987, p.2). Multiply-imputed data reflects the uncertainty contained in the imputation process in a way not p...
Combining synthetic data with subsampling to create public use microdata files for large scale surveys
To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants’ confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemi...
Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian, joint modeling approach to multiple imputation fo...
Multiple imputation in a large-scale complex survey: a practical guide.
The Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium is a multisite, multimode, multiwave study of the quality and patterns of care delivered to population-based cohorts of newly diagnosed patients with lung and colorectal cancer. As is typical in observational studies, missing data are a serious concern for CanCORS, following complicated patterns that impose severe challenge...
Analysis of Large-Scale Social Surveys
Large-scale social surveys are an important source of information for a wide range of topics. In analyzing such surveys, it is important to be aware of the complexity of the sampling design and the data adjustments that are used by survey organizations, including weighting to adjust for differences between sample and population and imputation to fill in missing responses. For estimating population...
Bayesian Multiple Imputation for Large-Scale Categorical Data with Structural Zeros
We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounting for structural zeros by conceiving of the observed data as a truncated sample from a hypothetical population without structural zeros. Thi...